Skip to content

feat(security): SSO/JWT authentication migration (Phase 1)#1569

Draft
jsell-rh wants to merge 62 commits into
mainfrom
jsell/spec/sso-authentication
Draft

feat(security): SSO/JWT authentication migration (Phase 1)#1569
jsell-rh wants to merge 62 commits into
mainfrom
jsell/spec/sso-authentication

Conversation

@jsell-rh
Copy link
Copy Markdown
Contributor

@jsell-rh jsell-rh commented May 12, 2026

Summary

SSO authentication migration — replaces OpenShift OAuth proxy with direct OIDC authentication via Keycloak. Includes a validated end-to-end native SSO deployment path (api-server + control plane) that bypasses the legacy backend and operator entirely.

What's implemented

  • Frontend BFF OIDC: Next.js acts as confidential OIDC client. Browser gets httpOnly session cookie, never sees a raw JWT. Authorization Code Flow with PKCE. Transparent token refresh (5-min access tokens, 30-min sessions).

  • Backend JWT validation: JWKS-based validation via lestrrat-go/jwx/v2. Validates signature, expiration, issuer, and audience.

  • K8s impersonation: Backend SA + Impersonate-User/Impersonate-Group headers preserve all existing RBAC enforcement without cluster OIDC federation. Prefers preferred_username over email for impersonation (matches Keycloak identity brokering claim order). Defaults to system:authenticated group when JWT has no groups.

  • Dual-path auth: JWT validation first, K8s TokenReview fallback for API keys (SA tokens). Both paths use impersonation.

  • SSO_ENABLED env var: SSO can be enabled via env var without depending on Unleash feature flag.

  • API server JWK_CERT_URL: Production environment no longer hardcodes the JWKS URL. Configurable via JWK_CERT_URL env var or --jwk-cert-url CLI flag, enabling non-RH-SSO OIDC providers (e.g., Keycloak).

  • Session expired UX: Global 401 detection via React Query cache, blocking dialog with login redirect, no retry storms.

  • E2E test auth: client_credentials grant from Keycloak with K8s SA fallback.

  • Local Keycloak: Kind cluster includes Keycloak with pre-configured realm, dev users, and protocol mappers. make kind-sso-toggle switches between SSO and legacy mode.

  • Native SSO deployment workflow: Complete workflow for deploying with Keycloak SSO, validated on hcmais cluster. See workflows/deploy-native-sso.md.

  • Migration ordering fix: roleBindings typedFKMigration ID changed from 202505130001 to 202603100139 to sort after the table-creation migration it depends on (202603100138 creates the role_bindings table; the old ID sorted before it, breaking fresh databases).

    Production DB impact verified: the ambient-code staging RDS already has 202505130001 in its migrations table. Shipping 202603100139 means gormigrate treats it as a new migration and runs it. Every statement uses IF EXISTS/IF NOT EXISTS/DROP NOT NULL (all idempotent), so all operations no-op on production. The migrations table will contain both IDs — redundant but harmless. Fresh databases (Kind, new namespaces) will only have 202603100139 in the correct sort position.

Deployment overlays

Overlays reorganized by cluster under components/manifests/overlays/hcmais/:

Overlay Purpose
hcmais/jsell-sso-poc/ Native SSO ambient deployment (api-server + control plane, no oauth-proxy)
hcmais/keycloak/ Keycloak instance deployment

Key files

Area Files
Spec specs/security/sso-authentication.spec.md
Deployment workflow workflows/deploy-native-sso.md
API server JWKS config components/ambient-api-server/cmd/ambient-api-server/environments/e_production.go
Backend SSO components/backend/handlers/sso.go, server/server.go
Frontend OIDC components/frontend/src/lib/oidc.ts, session.ts, auth.ts
Overlay (SSO PoC) components/manifests/overlays/hcmais/jsell-sso-poc/
Overlay (Keycloak) components/manifests/overlays/hcmais/keycloak/
RBAC base/rbac/backend-clusterrole.yaml, base/rbac/control-plane-*

Default behavior

  • make kind-up deploys with legacy auth (SA token, no Keycloak redirect)
  • make kind-sso-toggle enables Keycloak OIDC for both frontend and backend
  • Existing deployments using oauth-proxy are unaffected — SSO code paths only activate with SSO_ENABLED=true or the sso-authentication Unleash flag
  • API server falls back to sso.redhat.com JWKS if JWK_CERT_URL is not set

Native SSO deployment (validated)

The end-to-end path without legacy components was validated on the hcmais cluster:

acpctl → Keycloak auth → API Server (REST) → DB
                                ↓ gRPC event
                          Control Plane → Runner Pod → Claude response
                                ↑ gRPC messages
                          Runner → API Server → acpctl

Prerequisites for a new environment:

  1. Keycloak with 3 clients: ambient-frontend, ambient-control-plane (service account), ambient-cli (public)
  2. Identity brokering with RH SSO (optional, for RH user federation)
  3. Secrets: sso-credentials, control-plane-oidc, ambient-vertex
  4. See workflows/deploy-native-sso.md for the complete checklist

Test plan

  • Frontend login via Keycloak → session cookie → JWT forwarded to backend
  • Backend validates JWT, impersonates user, RBAC enforced
  • API key fallback via TokenReview still works
  • Token refresh works silently
  • Session expired dialog appears on refresh token expiry
  • Logout destroys session + Keycloak single sign-out
  • make kind-sso-toggle switches between SSO and legacy mode
  • Legacy mode (SA token auth) works when SSO is off
  • E2E token extraction via Keycloak client_credentials
  • Native SSO: acpctl login via Keycloak → create project → create session → runner pod → Claude response
  • Native SSO: control plane gRPC watch streams connected to api-server
  • Native SSO: runner authenticates via CP token server, pushes/watches session messages
  • Migration ordering: fresh Kind database bootstraps without errors
  • Migration idempotency: existing databases handle re-run safely

🤖 Generated with Claude Code

Define desired state for migrating from OpenShift OAuth proxy to direct
SSO/JWT authentication. Key decisions:

- BFF pattern: Next.js as OIDC confidential client, browser gets session cookie
- K8s impersonation: backend SA + Impersonate-User/Group preserves RBAC
- Dual-path auth: JWT first, TokenReview fallback for API keys
- Feature-flagged migration for incremental rollout
- Supersedes ADR-0002 (raw token passthrough → impersonation)

Includes migration workflow with consumer impact map and implementation notes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented May 12, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

Name Link
🔨 Latest commit f23c8bc
🔍 Latest deploy log https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/6a0f5cdb6620ee0008d6d131

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 12, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3042af88-9982-4d4f-8d0b-f2c62d7d04ea

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jsell/spec/sso-authentication
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch jsell/spec/sso-authentication

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

jsell-rh and others added 27 commits May 12, 2026 15:05
Reference the IAM consolidation proposal (PR #1466) as the long-term
direction. This spec is Phase 1; future phases cover API keys → SSO
service accounts, runner → OIDC token exchange, DB RBAC reconciler,
and credential consolidation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… requirements

- OIDC callback must coexist with existing integration auth routes
- SSO client configuration requirements (one per environment, audience isolation)
- Post-logout redirect URI and web origins specified

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kind/local-dev environments include Keycloak with pre-configured realm
- Replaces static JWKS ConfigMap, DISABLE_AUTH mock mode, and OC_TOKEN
- Same JWT validation code path as production (no dev-only auth logic)
- Realm config version-controlled as JSON export
- E2E tests use local Keycloak in Kind environments
- Design decision: Keycloak Identity Brokering for deployed environments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Keycloak Deployment, realm JSON, and env var config for Kind overlay
- Maps what it replaces (static JWKS, DISABLE_AUTH, test-user SA)
- Identity Brokering section for deployed environments
- Updated manifest changes to include Kind overlay additions/removals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d impersonation RBAC

Slice 1 of SSO authentication migration (Phase 1):

- Deploy Keycloak to Kind cluster with pre-configured realm (ambient-code)
  including confidential frontend client, public CLI client, and E2E
  client_credentials client. Dev users: developer/developer, admin/admin.
- Add jwtauth package with JWKS-based JWT validation using lestrrat-go/jwx/v2.
  Validates signature, expiration, issuer, and audience. Extracts OIDC claims
  (sub, email, preferred_username, groups).
- Add impersonate verb on users, groups, and serviceaccounts to backend-api
  ClusterRole for K8s impersonation under SSO auth.
- Fix Kind overlay: relax runAsNonRoot for ambient-api-server, make
  control-plane OIDC env vars optional.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mpersonation

Slice 2 of SSO authentication migration (Phase 1):

- Wire JWT validation into backend middleware: forwardedIdentityMiddleware
  validates JWT against Keycloak JWKS, extracts identity from OIDC claims
  (sub, email, preferred_username, groups), and stores validated claims
  in Gin context for reuse by handlers.
- Add dual-path auth in getK8sClientsDefault: JWT validation first, then
  TokenReview fallback for API keys (K8s ServiceAccount tokens).
- Use K8s impersonation (Impersonate-User/Group) instead of raw bearer
  token when SSO is enabled. Backend SA token + impersonation preserves
  all existing RBAC enforcement.
- Fix SSAR cache key to include impersonated identity instead of shared
  SA token, preventing cross-user authorization cache leaks.
- Gate SSO path behind "sso-authentication" Unleash feature flag.
- Add SSO env vars (SSO_ISSUER_URL, SSO_AUDIENCE) to backend Kind overlay.
- Fix Keycloak realm: add audience mapper and protocol mappers for sub,
  email, preferred_username claims in access token.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slice 3 of SSO authentication migration (Phase 1):

- Add openid-client v6, iron-session v8, and jose as dependencies
- Create OIDC client layer (src/lib/oidc.ts): discovery, authorization URL
  construction with PKCE, code exchange, token refresh, end-session URL
- Create encrypted session cookie management (src/lib/session.ts):
  iron-session with httpOnly/secure/sameSite cookies, transparent token
  refresh when access token is within 60s of expiry
- Add SSO API routes:
  - /api/auth/sso/login: generates PKCE, stores verifier/state in cookies,
    redirects to Keycloak authorization endpoint
  - /api/auth/sso/callback: exchanges code for tokens, stores in session
  - /api/auth/sso/logout: destroys session, redirects to Keycloak logout
- Add Next.js middleware: redirects unauthenticated page requests to SSO
  login when SSO_ENABLED=true
- Modify buildForwardHeadersAsync: SSO path extracts JWT from session,
  sets Authorization: Bearer and X-Forwarded-* headers from JWT claims.
  All 97+ consumers are unaffected.
- Update navigation logout to use SSO logout route when enabled
- Update /api/me to accept Authorization header for auth check
- Add SSO env vars to Kind frontend deployment patch
- Support SSO_PUBLIC_ISSUER_URL for Kind dev (browser vs cluster URLs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keycloak supports any port for localhost redirect URIs per RFC 8252
section 7.3. Registering http://localhost/* (without port) accepts
callbacks on any ephemeral port, eliminating port-forward mismatches.

Also set webOrigins to "+" (all valid redirect origins) for CORS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openid-client v6 requires a standard URL instance (not NextURL).
Construct callback URL from SSO_REDIRECT_URI base to match the
redirect_uri sent during authorization, since request.url inside
the container resolves to 0.0.0.0:3000 rather than localhost:11646.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In Kind, Keycloak's iss response parameter uses the public URL
(localhost:30090) while openid-client validates against the internal
URL (keycloak-service:8080). Remap the iss param before passing to
authorizationCodeGrant so RFC 9207 issuer validation passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set KC_HOSTNAME to the internal service URL so Keycloak uses a
consistent issuer in all tokens and OIDC responses, regardless of
whether the browser reaches it via localhost:30090 or the server
reaches it via keycloak-service:8080. This eliminates issuer
mismatches in ID token validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In Kind, the browser reaches Keycloak via localhost:30090 while backend
and frontend servers use keycloak-service:8080. Keycloak sets the token
issuer based on the authorization session URL, causing mismatches.

Fixes:
- Add alt issuer support to JWT validator (AddAltIssuer) so the backend
  accepts tokens from both internal and public Keycloak URLs. Production
  environments use a single URL and don't need alt issuers.
- Use standard openid-client authorizationCodeGrant in production (full
  ID token validation). Fall back to manual token exchange in dev when
  SSO_PUBLIC_ISSUER_URL differs from SSO_ISSUER_URL.
- Set cookies directly on redirect response in login route (cookies()
  API mutations don't transfer to NextResponse.redirect).
- Derive post-login redirect origin from SSO_REDIRECT_URI to avoid
  container-internal 0.0.0.0:3000 address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ints

The OIDC discovery config was cached as a module-level singleton with no
expiry. If Keycloak restarted and got a new ClusterIP, token refresh
calls would fail silently (ECONNREFUSED) and the session would be
destroyed, logging the user out.

Add a 5-minute TTL so the config is re-discovered periodically. This
matches the Keycloak JWKS cache interval and ensures endpoint URLs
stay current after dependency restarts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
getUserSubjectFromContext now prefers userEmail (matching the
Impersonate-User header) when creating RoleBindings. Previously it
used userName (preferred_username), causing a mismatch: the
RoleBinding subject would be "developer" but impersonation would
use "developer@local.dev", so RBAC checks would fail.

This ensures lazy RoleBinding creation in CreateProject works
correctly with SSO impersonation — no manual RoleBindings needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Callback route: redirect to /api/auth/sso/login instead of showing
  JSON error when OIDC state cookies are missing or exchange fails.
  Handles stale Keycloak sessions that skip the login page.
- Logout route: derive post-logout redirect URI from SSO_REDIRECT_URI
  to avoid 0.0.0.0:3000 container address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NEXT_PUBLIC_* env vars are inlined at build time in Next.js client
components, so they're unavailable when the image is built without
them. Instead, expose ssoEnabled from the /api/me server route and
read it in the navigation component via useCurrentUser().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The project layout had its own handleLogout hardcoded to /oauth/sign_out,
separate from the main navigation. Unified both to use the runtime
ssoEnabled flag from useCurrentUser().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slice 4 of SSO authentication migration (Phase 1):

- Update extract-token.sh to obtain JWT from Keycloak via
  client_credentials grant (ambient-e2e client). Falls back to K8s
  SA token when Keycloak is not available.
- Add audience and sub protocol mappers to ambient-e2e Keycloak client
  so tokens have proper aud claim for backend validation.
- Add ClusterRoleBinding for e2e service account identity
  (service-account-ambient-e2e) so E2E tests can access projects.
- No developer RoleBindings — JIT provisioning via CreateProject
  handles first-time access correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the OIDC session expires and token refresh fails, the user now
sees a blocking dialog instead of silent 401 errors:

- Global 401 detection via QueryCache/MutationCache onError handlers
- Skip retries on 401 to prevent request storms against the IdP
- Non-dismissable AlertDialog with "Log in" button that preserves
  returnTo path so users land back on the same page
- No "expiring soon" warning — server-side refresh handles access
  token renewal transparently; only surfaces when refresh token dies

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add logging to getAccessToken so token refresh attempts and failures
  are visible in pod logs (was silently swallowing errors).
- Fix middleware to return 401 JSON for RSC/fetch requests instead of
  redirecting to Keycloak. Cross-origin redirects fail as XHR and cause
  CORS errors. Full page navigations still redirect to login.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openid-client's refreshTokenGrant validates the ID token iss claim
in the refresh response, which fails when the token was issued by
localhost:30090 but the refresh goes through keycloak-service:8080.

Use manual fetch to the token endpoint in split-URL mode (same
approach as code exchange). Production uses the library's standard
refreshTokenGrant with full validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The split-URL problem (browser→localhost:30090, server→keycloak-service:8080)
caused token issuer mismatches that broke refresh tokens and ID token
validation. Every workaround added complexity.

Root fix: proxy Keycloak through the frontend at /sso/* so browser and
server both reach Keycloak through the same origin. Combined with
KC_HOSTNAME=http://keycloak-service:8080, all tokens now have a
consistent issuer that matches the discovery endpoint.

Changes:
- Add /sso/[...path] catch-all route that proxies to Keycloak,
  rewriting Location headers on redirects
- Set KC_HOSTNAME to internal service URL for consistent token issuer
- Update SSO_PUBLIC_ISSUER_URL to use the proxy path
- Exclude /sso from auth middleware matcher
- Remove unused next.config.js rewrites (build-time, not runtime)

This eliminates: alt issuers on the backend, manual token exchange
fallbacks, iss parameter remapping in callbacks, and CORS errors on
session expiry redirects. Production deployments use a single URL
and don't need the proxy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates the split-URL issuer mismatch by properly configuring
Keycloak's hostname-backchannel-dynamic feature:

- KC_HOSTNAME=http://localhost:11646/sso — all tokens use the
  public URL as issuer, login pages render with proxy URLs
- KC_HOSTNAME_BACKCHANNEL_DYNAMIC=true — internal services get
  backchannel URLs (token_endpoint, jwks_uri) via keycloak-service:8080

Frontend changes:
- Manual OIDC discovery to bypass openid-client v6's issuer validation
  (known issue: github.com/panva/openid-client/issues/737)
- Remove all split-URL workarounds (manual token exchange, iss
  remapping, URL rewriting in auth/logout/refresh)
- openid-client's standard authorizationCodeGrant and refreshTokenGrant
  now work correctly for all flows

Backend changes:
- JWT validator uses discovered issuer from OIDC metadata (not the
  discovery URL) so it accepts the public issuer in tokens

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Frontend README: replace OC_TOKEN/OAuth proxy header docs with SSO
  env vars and OIDC session model description
- Frontend .env.example: add SSO_* vars, move OC_* to legacy section
- Backend README: replace DISABLE_AUTH migration guide with Keycloak
  dev auth instructions (JWT and SA token examples)
- E2E README: update quick start to use extract-token.sh (Keycloak
  client_credentials with K8s SA fallback)
- Kind dev guide: add Keycloak to bootstrap steps, document dev
  credentials and session lifetimes
- CONTRIBUTING.md: add Keycloak to kind-up description, update
  access instructions with login info
- OPENSHIFT_OAUTH.md: mark as legacy, link to SSO spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…facing

The sso-authentication Unleash flag controls which auth path the
backend uses. It is not visible in workspace settings and is not
user-configurable — ops enables it per-environment during migration.
Kind dev cluster creates and enables it automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jsell-rh and others added 10 commits May 19, 2026 10:45
Use client_id instead of id_token_hint for the end-session URL when
the id_token is not available in the session (dropped to stay under
the 4KB cookie size limit).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…flag

OverrideConfig previously hardcoded the JWKS URL to sso.redhat.com,
ignoring both --jwk-cert-url flag and environment variables. This
makes it impossible to use a different OIDC provider (e.g. Keycloak)
without a code change.

Priority is now: JWK_CERT_URL env var > --jwk-cert-url flag > default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds complete workflow for deploying Ambient with native Keycloak SSO,
bypassing the legacy oauth-proxy and agentic-operator. Includes
overlay for jsell-sso-poc namespace on hcmais cluster with all
patches for api-server, control plane, and RBAC.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SSO_ENABLED env var so SSO works without Unleash feature flag
- Prefer preferred_username over email for k8s impersonation (matches
  Keycloak identity brokering claim order)
- Default to system:authenticated group when JWT has no groups
- Add debug logging for JWT validation failures
- Fix roleBindings migration ID to avoid conflict

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Organize overlays by cluster. hcmais/ now contains both the Keycloak
deployment and the jsell-sso-poc ambient deployment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bases

The ID was changed for a PoC database conflict. Reverting to the
original ID so production databases that already ran this migration
aren't affected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The integration test package had a TestMain (server bootstrap) but
zero test functions. Go test treats this as a failure in verbose mode.
Added a minimal test that validates the server starts successfully.

Pre-existing issue since 09b88a4, not caused by this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@markturansky
Copy link
Copy Markdown
Contributor

Review — feat(security): SSO/JWT authentication migration (Phase 1)

Solid implementation overall. The BFF pattern (Next.js as confidential OIDC client, browser never sees a raw JWT) is the right call, and the dual-path auth (JWT → TokenReview fallback) makes incremental rollout safe. A few things worth a second look before merge:


🔴 Migration ID — verify final state

The commit history shows three conflicting commits on the roleBindings migration ID:

  1. ID changed 202505130001 → 202603100139 (to sort after the table-creation migration)
  2. Then reverted back to 202505130001
  3. Then that revert was itself reverted — net effect: 202603100139 lands

The PR description says "Idempotent — safe for existing databases where the old ID already ran" but does not explain the revert-of-revert. If existing prod databases already have 202505130001 in the migrations table, and this PR ships 202603100139, the migration will run again on those databases. Confirm this is intentional and that the migration body is truly idempotent.


🟡 e2e-login route — production guard needs hardening

/api/auth/sso/e2e-login accepts a raw token and injects a session cookie, guarded only by E2E_TEST_HELPERS env var. If that var is ever set on a production pod (misconfiguration, copy-paste from a test overlay), it becomes an auth bypass with no other checks. Consider:

  • Adding an explicit process.env.NODE_ENV !== 'production' guard in addition to E2E_TEST_HELPERS, or
  • Documenting clearly in the overlay that E2E_TEST_HELPERS=true is forbidden in non-test namespaces and adding a CI lint rule to catch it.

🟡 preferred_username vs email for impersonation — verify consistency

Two commits pull in opposite directions:

  • Main impl: prefers preferred_username for Impersonate-User
  • Fix commit 77548d6: getUserSubjectFromContext should use userEmail when creating RoleBindings to match the impersonation identity

The SSAR cache key fix (ssar_cache.go) that patches the cross-user cache leak is critical and correct — just make sure the field used for the cache key (userEmail or preferred_username) matches exactly what's in the Impersonate-User header across all code paths. A mismatch here would re-introduce the cache leak.


🟡 JWK cert URL priority

JWK_CERT_URL env var > --jwk-cert-url flag > default — env var overriding an explicit CLI flag is non-standard and surprising. Operators passing --jwk-cert-url at deploy time would have the env var silently win. Conventional priority is flag > env var > default.


✅ Strengths worth calling out

  • SSAR cache key fix (cross-user authorization cache leak) — critical bug, correctly fixed
  • OIDC discovery TTL (5 min) prevents stale endpoints after Keycloak restarts
  • Keycloak proxy through frontend (/sso/*) cleanly eliminates the split-URL issuer mismatch — good final design decision
  • E2E runs both legacy and SSO modes in CI — strong coverage
  • SSO_ENABLED env var bypass of Unleash is the right escape hatch for native deployment

Review posted by lead agent (markturansky/hcmai-01)

…I spec

The OpenAPI spec defines credentials under /projects/{id}/credentials/
but the route registration only had /credentials/. The generated SDK
client calls the project-scoped path, causing all credential
integration tests to 404.

Registers the same handlers under both paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@markturansky
Copy link
Copy Markdown
Contributor

CI Failure — plugins/credentials integration tests

Following up on my earlier review: the current CI run has a failing check that appears to confirm the migration ordering concern I raised.

Job: API Server Integration Testsview run

Failing tests (all in plugins/credentials package):

  • TestCredentialGet, TestCredentialPost, TestCredentialPatch, TestCredentialPaging, TestCredentialListSearch, TestCredentialListTokenOmitted, TestCredentialToken, TestCredentialDelete

Error (from logs):

integration_test.go:65:
    Error posting object: 404 Not Found
    body: {"reason":"The requested resource '/api/ambient/v1/projects/test-project/credentials' doesn't exist"}

This is an API-level 404, not an HTTP routing miss — the route exists, but the prerequisite project (test-project) doesn't exist when the credentials tests run. The server starts cleanly (TestServerStarts passes), so this is a test setup / migration ordering issue.

Likely cause: The roleBindings migration ID change (202505130001 → 202603100139) means the migration now runs after all 202505... migrations. If test-project creation depends on the roleBindings migration having run, and that migration is now delayed in the ordering, the credentials plugin tests fail because the project prerequisite is never satisfied.

The fix(api-server): add placeholder integration test to fix CI commit (most recent) fixed the TestMain-with-no-tests issue but didn't address this underlying failure in the credentials package.

Suggested fix: Either confirm the migration ordering is correct for a fresh test database, or add an explicit t.Skip / pre-condition check in the credentials integration tests that creates/verifies the test project before running credential operations.

Lead agent — hcmai-01

…obber

Kustomize's namespace transformer rewrites all subject namespaces in
ClusterRoleBindings, which breaks the ambient-code namespace bindings
when deploying to a different namespace. Move CRBs to a separate file
applied independently of kustomize.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jsell-rh
Copy link
Copy Markdown
Contributor Author

Review response

Thanks for the thorough review @mturansk. All points addressed in e89e44ae:

🔴 Migration ID

Verified against production DB. The ambient-code staging RDS has 202505130001 already in the migrations table. Shipping 202603100139 means gormigrate runs it as a "new" migration, but every statement is idempotent (IF EXISTS/IF NOT EXISTS/DROP NOT NULL). All operations no-op. The migrations table will have both IDs — redundant but harmless. Updated PR description with this evidence.

🟡 e2e-login hardening

Added AMBIENT_ENV === "production" guard in addition to E2E_TEST_HELPERS. Both conditions must be met for the route to be disabled — even if E2E_TEST_HELPERS leaks into a production overlay, AMBIENT_ENV=production blocks it.

🟡 preferred_username vs email consistency

Aligned getUserSubjectFromContext claim priority to match buildImpersonatingClients: preferred_username > email > sub. The SSAR cache key uses claims.Sub (set at server.go:232), which is a stable per-user identifier regardless of which claim is used for impersonation — no cache leak.

🟡 JWK cert URL priority

Fixed to conventional ordering: CLI flag > env var > default.

CI failures

Both were pre-existing:

  • API Server Integration Tests: TestMain with zero test functions (added placeholder, then also fixed the credential route mismatch — routes now registered at both /credentials/ and /projects/{id}/credentials/ to match OpenAPI spec)
  • E2E scheduled-sessions: flaky OOTB workflow test, passes on re-run, unrelated to our changes

- e2e-login: add AMBIENT_ENV=production guard alongside E2E_TEST_HELPERS
  to prevent accidental auth bypass in production
- impersonation: align getUserSubjectFromContext claim order with
  buildImpersonatingClients (preferred_username > email > sub) so
  RoleBinding subjects match the impersonated identity
- JWK_CERT_URL: fix priority to CLI flag > env var > default (standard
  convention, per review feedback)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jsell-rh jsell-rh force-pushed the jsell/spec/sso-authentication branch from e89e44a to 084e113 Compare May 21, 2026 18:25
Copy link
Copy Markdown
Contributor

@markturansky markturansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amber Code Review — feat(security): SSO/JWT authentication migration (Phase 1)

Summary

Substantial, well-structured SSO migration. The core auth path — JWKS-based JWT validation, K8s impersonation, dual-path (JWT + TokenReview) fallback, and encrypted httpOnly session cookie — is architecturally sound and the approach is correct. One critical security flaw in the OIDC callback (open redirect) must be fixed before merge, along with a few major issues around the Keycloak proxy surface and committed credential placeholders.


Findings

Critical

Open redirect in OIDC callbackcomponents/frontend/src/app/api/auth/sso/callback/route.ts

const returnTo = cookieStore.get("oidc_return_to")?.value || "/";
// ...
return NextResponse.redirect(new URL(returnTo, origin));

new URL(absoluteUrl, base) ignores base when the first argument is already absolute. An attacker can craft /api/auth/sso/login?returnTo=https://evil.com; the server stores it in the httpOnly oidc_return_to cookie; after successful OIDC login the user is silently redirected to evil.com. Classic phishing vector.

Fix: validate returnTo is a relative path before storing/using it:

function safeReturnTo(value: string | undefined): string {
  if (!value) return "/";
  try {
    const url = new URL(value, "http://localhost");
    // Reject if the input was already absolute (has its own host)
    if (new URL(value, "http://localhost").origin !== "http://localhost") return "/";
    return url.pathname + url.search;
  } catch { return "/"; }
}

Apply in both login/route.ts (before storing to cookie) and callback/route.ts (before using from cookie).


Major

1. CHANGE_ME placeholders committed to cluster overlaycomponents/manifests/overlays/hcmais/jsell-sso-poc/

sso-credentials.yaml has SSO_CLIENT_SECRET: CHANGE_ME, SESSION_SECRET: CHANGE_ME. keycloak-admin-secret.yaml has password: CHANGE_ME. keycloak-db.yaml has POSTGRES_PASSWORD: CHANGE_ME. keycloak-realm.json has "clientSecret": "CHANGE_ME" for both the frontend client and the RH SSO identity provider.

These are in a cluster-specific overlay intended for production (hcmais). If applied without rotation the session cookie encryption key is trivially known. Should either:

  • Add a # REQUIRED: rotate before applying comment block and a pre-flight check in the deployment workflow
  • Or use ExternalSecret / SealedSecret so real values are never committed

The Kind overlay correctly uses dev-secret-do-not-use-in-prod labels and is isolated to dev. The hcmais overlay needs the same level of care.

2. Unrestricted Keycloak proxycomponents/frontend/src/app/sso/[...path]/route.ts

This route proxies all GET and POST requests to SSO_ORIGIN (Keycloak) with no path allowlisting. It exposes every Keycloak HTTP endpoint to the public internet via the frontend, including the admin REST API (/auth/admin/...), token introspection, user endpoints, etc. Keycloak has its own auth on admin routes, but this needlessly widens the attack surface and makes ingress WAF rules harder.

Suggested: restrict to paths the OIDC flow actually needs:

const ALLOWED_PATHS = /^(\/realms\/[^/]+\/(protocol\/openid-connect|\.well-known))/;
if (!ALLOWED_PATHS.test(`/${path}`)) {
  return NextResponse.json({ error: "Not found" }, { status: 404 });
}

3. E2E auth bypass reachable with SSO enabled on Kindcomponents/manifests/overlays/kind/frontend-test-patch.yaml

E2E_TEST_HELPERS=true is set unconditionally in the Kind test patch. The e2e-login route's second guard (AMBIENT_ENV === "production") is only meaningful if production overlays explicitly set AMBIENT_ENV=production on the frontend deployment. The Kind overlay doesn't set it, so after make kind-sso-toggle (SSO on), the auth bypass endpoint is live at /api/auth/sso/e2e-login for anyone who can reach the Kind cluster and knows the route exists.

Dev-only cluster, so production impact is nil. But a developer running Kind SSO mode with external access (e.g., port-forwarded to a shared network) is exposed. Consider removing E2E_TEST_HELPERS from the standard Kind patch and requiring it to be set via a separate make kind-e2e-mode toggle, or add AMBIENT_ENV=development to the route guard so the check is explicit both ways.


Minor

4. Module-level OIDC config cachecomponents/frontend/src/lib/oidc.ts:5-9

cachedConfig and cachedAt are module-level variables. In serverless / Next.js edge deployments, module state is not guaranteed to persist across invocations, which could cause repeated OIDC discovery on every request. Consider using lru-cache or an explicit cache tied to the Next.js request lifecycle, or document that this is intentional for long-lived Node.js processes only.

5. Silent session destroy on missing refresh tokencomponents/frontend/src/lib/session.ts:60-63

When the access token is expired and there's no refresh token, session.destroy() is called silently. No log entry makes it hard to debug "why did my session disappear?" Add: console.warn("SSO: session expired with no refresh token, destroying").

6. Kind Keycloak SecurityContextcomponents/manifests/overlays/kind/keycloak-deployment.yaml:16

runAsNonRoot: false for Kind Keycloak and Kind api-server (kind/api-server-security-patch.yaml). CLAUDE.md mandates runAsNonRoot on all containers. Dev-only, so no prod risk, but worth noting the deliberate divergence with an inline comment explaining why Keycloak requires it (# Keycloak upstream image requires root; acceptable in Kind dev only).

7. Empty smoke testcomponents/ambient-api-server/test/integration/integration_test.go:258-261

TestServerStarts has no assertions — the comment explains the intent, but this pattern is fragile: a future refactor that skips TestMain would leave a silently-passing test. Consider require.NotNil(t, helper.AppConfig()) or equivalent to make the assertion explicit.


Positive Highlights

  • JWT validator is well-built: JWKS cache with 5-min min-refresh, proper issuer/audience/expiration validation, manual issuer check to support alt-issuers (dev vs. public URL). The validator_test.go is thorough — expired, tampered signature, wrong issuer/audience, malformed, empty, minimal claims — all covered.
  • K8s impersonation approach is the right call: no cluster OIDC federation required, all existing RBAC bindings continue to work, SA tokens (API keys) accepted via TokenReview fallback. Identity claim resolution order (preferred_username > email > sub) is consistent across buildImpersonatingClients, setIdentityFromClaims, and getUserSubjectFromContext.
  • SSAR cache key improved from raw bearer token to OIDC sub — reduces cache churn on token rotation and is more correct semantically.
  • InitJWTValidator non-fatal design: server starts normally without SSO configured; the feature flag is also off in that state. Clean.
  • Session cookie correctly configured: httpOnly, SameSite=lax, encrypted via iron-session.
  • PKCE+state implemented correctly for Authorization Code Flow.
  • Migration ID change well-reasoned and idempotency analysis is rigorous.
  • forwardedIdentityMiddleware falls through to legacy header-based path on JWT failure, preserving backward compatibility.

Recommendations (priority order)

  1. Fix open redirect in callback/route.ts and login/route.ts — validate returnTo is relative before storing/using. (~10 lines)
  2. Add path allowlist to sso/[...path]/route.ts — block everything except /realms/*/protocol/openid-connect/* and /.well-known/*.
  3. Address CHANGE_ME placeholders in hcmais/jsell-sso-poc/ — add deployment workflow gate or switch to ExternalSecret/SealedSecret.
  4. Document / isolate E2E_TEST_HELPERS — avoid having it enabled by default on the Kind overlay when SSO is toggled on.
  5. Items 4–7 are quality improvements that can be addressed in follow-up tickets.

Review by Amber · standards refs: specs/standards/security/security.spec.md, skills/amber-review/SKILL.md, CLAUDE.md

Critical:
- Fix open redirect in OIDC callback — validate returnTo is a relative
  path before storing in cookie and before redirecting. Absolute URLs
  (e.g. https://evil.com) are rejected and replaced with "/".

Major:
- Add path allowlist to Keycloak proxy — only OIDC-required paths
  (realms/*/protocol/openid-connect, .well-known, login-actions,
  resources) are proxied. Admin API and other endpoints return 404.
- Add warning comments to CHANGE_ME placeholders in sso-credentials.yaml
  pointing to the deployment workflow.

Minor:
- Add console.warn when session is destroyed due to missing refresh token
- Add real assertion to TestServerStarts (verify helper initialized and
  server URL is non-empty)
- Add inline comment explaining runAsNonRoot:false on Kind Keycloak

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jsell-rh
Copy link
Copy Markdown
Contributor Author

Response to Amber code review (#4339633631)

All findings addressed in 1d194b00:

Critical — Open redirect

Fixed. Added safeReturnTo() validation in both login/route.ts and callback/route.ts. Absolute URLs are rejected and replaced with /. Validates by parsing with a localhost base and checking the origin matches.

Major 1 — CHANGE_ME placeholders

Added warning comment block pointing to workflows/deploy-native-sso.md Step 3a.

Major 2 — Keycloak proxy allowlist

Added ALLOWED_PATH_PREFIX regex restricting to realms/*/protocol/openid-connect, .well-known, login-actions, and resources. Everything else returns 404.

Major 3 — E2E auth bypass

Already addressed in prior commit — added AMBIENT_ENV === "production" guard. The Kind concern (dev cluster exposure) is valid but low-risk; noted for follow-up.

Minor items

  • OIDC config cache: Intentional for long-lived Node.js processes. Serverless edge is not a current target.
  • Silent session destroy: Added console.warn before session.destroy().
  • Kind SecurityContext: Added inline comment explaining the deliberate divergence.
  • Empty smoke test: Added real assertions — verifies testHelper initialized and RestURL returns non-empty.

Copy link
Copy Markdown
Contributor

@markturansky markturansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: SSO/JWT Authentication Migration (Phase 1)

Holistic re-review after fixup commit 1d194b00. Previous review posted 2026-05-21.


Fixes verified ✅

All six items from the initial review were addressed in the fixup commit. Each one is correctly implemented:

Open redirect in callback/login routessafeReturnTo() now resolves all inputs against http://localhost and rejects anything whose origin differs from that base. Tested mentally against the classic attack vectors: https://evil.com → origin mismatch → /; //evil.com/path → resolves to http://evil.com/path → origin mismatch → /; javascript:alert(1) → URL parse throws → catch → /; /projects/foo → same origin → /projects/foo. Fix is correct.

Keycloak proxy path allowlist — regex /^realms\/[^/]+\/(protocol\/openid-connect|\.well-known|login-actions|resources)\b/ blocks everything except the four legitimate Keycloak endpoint families. Correct.

E2E auth bypass double guard — now requires both E2E_TEST_HELPERS === "true" and AMBIENT_ENV !== "production" before the bypass activates. Returns 404 otherwise. Correct.

CHANGE_ME comment on sso-credentials.yaml — header comment added, clearly marking placeholder values and linking to the deploy workflow. Correct.

runAsNonRoot in Kind overlay — inline comment explains why root is acceptable for the Keycloak upstream image in a dev-only context. Correct.

Integration test guardTestServerStarts now fails fast with a clear message if TestMain didn't initialize the helper. Correct.


Remaining gaps ⚠️

1. CHANGE_ME comment missing from three sibling overlay files (minor → should fix)

sso-credentials.yaml got the # REQUIRED: Replace CHANGE_ME values before deploying header, but the other three files in the same overlay directory did not:

  • components/manifests/overlays/hcmais/jsell-sso-poc/keycloak-admin-secret.yaml
  • components/manifests/overlays/hcmais/jsell-sso-poc/keycloak-db.yaml
  • components/manifests/overlays/hcmais/jsell-sso-poc/keycloak-realm.json

All three contain literal placeholder strings that would break a real deployment if applied as-is. Same fix needed: add the required-replace comment at the top of each file.

2. E2E_TEST_HELPERS=true unconditional in Kind patch (minor → should fix)

components/manifests/overlays/kind/frontend-test-patch.yaml still sets E2E_TEST_HELPERS=true unconditionally. The E2E route is now double-guarded in code, but anyone running the Kind overlay gets an always-on test-helper surface. This should either be removed from the base Kind patch and injected only in CI test runs, or at minimum carry a comment explaining the intent. As it stands, any developer who applies the Kind overlay locally activates the bypass endpoint.


New bug: session.idToken is never stored 🐛 (must fix before merge)

logout/route.ts reads session.idToken to build the OIDC end_session_endpoint URL:

// logout/route.ts
const endSessionUrl = await getEndSessionUrl(
  session.idToken,   // ← always undefined
  redirectUrl,
);

getEndSessionUrl uses idToken as the id_token_hint parameter, which tells the identity provider which session to terminate. If it is undefined, the function falls back to passing only client_id, which means Keycloak may not terminate the upstream SSO session — only the local cookie is destroyed. Users will find themselves re-logged-in silently the next time they visit, which defeats single sign-out.

The root cause is in callback/route.ts. exchangeCode() returns idToken, but the callback only persists three of the four fields:

// callback/route.ts (current)
const tokens = await exchangeCode(code, codeVerifier, redirectUri);
session.accessToken  = tokens.accessToken;
session.refreshToken = tokens.refreshToken;
// session.idToken is dropped here   ← bug
session.expiresAt    = tokens.expiresAt;
await session.save();

Fix is one line:

session.idToken      = tokens.idToken;

This must be added before session.save() in the callback route.


Style nit (optional)

safeReturnTo() is copy-pasted identically into both login/route.ts and callback/route.ts. It should live in a shared module (e.g. src/lib/auth-utils.ts) and be imported from both routes. Not a blocker, but worth cleaning up before this ships to production to avoid the two copies drifting.


Verdict

The fixup commit is a solid response to the initial review — all six critical and major issues are correctly addressed. Two minor gaps remain (CHANGE_ME comments on sibling files, unconditional E2E_TEST_HELPERS in Kind patch), and there is one new must-fix bug: session.idToken is never persisted to the iron-session cookie, which silently breaks OIDC single sign-out.

Request changes on the idToken omission. The other items can be addressed in the same commit. Happy to re-review once those three lines are in.

…E_ME

Must-fix:
- Persist idToken to session cookie in OIDC callback so logout can
  pass id_token_hint to Keycloak end_session_endpoint (single sign-out)

Cleanup:
- Extract safeReturnTo() to shared lib/auth-utils.ts, import from
  both login and callback routes
- Add CHANGE_ME warnings to keycloak-admin-secret.yaml, keycloak-db.yaml,
  and keycloak kustomization.yaml (keycloak-realm.json is JSON, noted
  in kustomization instead)
- Add security comment to E2E_TEST_HELPERS in Kind overlay

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jsell-rh
Copy link
Copy Markdown
Contributor Author

Response to re-review

All items addressed in 05bb922e:

idToken bug (must-fix): Added session.idToken = tokens.idToken in the callback route. Single sign-out via id_token_hint will now work correctly.

CHANGE_ME on sibling files: Added warnings to keycloak-admin-secret.yaml, keycloak-db.yaml, and keycloak/kustomization.yaml (JSON can't have comments, so noted there instead).

safeReturnTo dedup: Extracted to src/lib/auth-utils.ts, imported from both routes.

E2E_TEST_HELPERS comment: Added security note to the Kind overlay explaining the double-guard and advising against shared-network exposure.

…bber

The sso-credentials.yaml contains CHANGE_ME placeholders. Including it
in kustomize resources causes oc apply -k to overwrite manually-set
secret values. Same pattern as the CRB namespace clobber fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@markturansky markturansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third Pass — Thorough Line-by-Line Review

Reviewed after commit 05bb922e. All three previous fixup commits now absorbed.


All prior issues confirmed resolved ✅

Every finding from reviews 1 and 2 is correctly addressed. I walked each file end-to-end below; I won't rehash what's already fixed.


New findings

MAJOR — hcmais Keycloak realm missing groups scope

File: components/manifests/overlays/hcmais/keycloak/keycloak-realm.json

The Kind realm correctly defines a realm-level groups clientScope with an oidc-group-membership-mapper, and all three clients include it in defaultClientScopes. The hcmais realm does neither:

Kind realm:    ambient-frontend defaultClientScopes: ['openid', 'email', 'profile', 'groups'] ✅
hcmais realm:  ambient-frontend defaultClientScopes: ['openid', 'email', 'profile']         ✗

There is no groups clientScope definition at the realm level and no per-client protocol mapper for group membership. Tokens issued by the hcmais Keycloak will have no groups claim.

In buildImpersonatingClients (handlers/sso.go):

groups := claims.Groups
if len(groups) == 0 {
    groups = []string{"system:authenticated"}
}

Every SSO user on hcmais will be impersonated as system:authenticated and nothing else. Any K8s RoleBinding bound to a specific group (e.g., an LDAP-synced group like ambient-users) will not be honoured. Users may see unexpected 403s on resources gated by group membership.

Fix: Add the groups clientScope to the hcmais realm, mirroring the Kind config:

"clientScopes": [
  {
    "name": "groups",
    "protocol": "openid-connect",
    "protocolMappers": [{
      "name": "groups",
      "protocolMapper": "oidc-group-membership-mapper",
      "config": {
        "full.path": "false",
        "access.token.claim": "true",
        "id.token.claim": "true",
        "claim.name": "groups"
      }
    }]
  }
]

And add "groups" to ambient-frontend.defaultClientScopes.


MAJOR — Identity claim precedence is inconsistent across the stack

There are three places that resolve "who is this user"; all three use a different priority order:

Site Priority order
buildImpersonatingClients (sso.go) preferred_usernameemailsub
setIdentityFromClaimsuserID (server.go) emailsub
setIdentityFromClaimsuserName (server.go) preferred_usernameemail

K8s impersonation uses preferred_username as the primary. The context userID uses email as the primary. For any user that has both claims (the common case), the K8s identity is their preferred_username (e.g. jsell) but the application userID is their email (e.g. jsell@redhat.com). These two values are used in different places — userID for secret key naming, credential lookup; K8s identity for RBAC. If any RBAC RoleBinding is authored with the email address as the subject, it would match K8s impersonation only when preferred_username is absent. If it is authored with the username, userID-keyed lookups won't match.

The root fix is to make setIdentityFromClaims use preferred_username first for userID, matching the impersonation order:

// Consistent with buildImpersonatingClients
primaryID := claims.PreferredUsername
if primaryID == "" {
    primaryID = claims.Email
}
if primaryID == "" {
    primaryID = claims.Sub
}
c.Set("userID", SanitizeUserID(primaryID))
c.Set("userIDOriginal", primaryID)

MINOR — SSO middleware falls through to X-Forwarded-* headers after JWT failure

File: components/backend/server/server.go, forwardedIdentityMiddleware

When SSO is enabled and JWT validation fails (e.g., the request carries an API key that is not a Keycloak JWT), the middleware logs the failure and falls through to the legacy OAuth proxy path:

// JWT validation failed — fall through to header-based extraction
// (API keys will be handled by getK8sClientsDefault via TokenReview)
// [falls through to reading X-Forwarded-User, X-Forwarded-Email, etc.]

The comment is correct — API key identity is re-established by setIdentityFromTokenReview inside getK8sClientsDefault, which overwrites whatever the middleware set. So in practice no handler sees wrong identity. But the middleware shouldn't trust X-Forwarded-* headers at all when SSO is the active auth mode — those headers belonged to the OpenShift OAuth proxy era and should be treated as untrusted in the SSO path.

The minimal fix is a guarded return:

if SSOEnabled() {
    // JWT failed AND we're in SSO mode — don't fall through to OAuth proxy headers.
    // Identity will be established by tokenReviewIdentity in getK8sClientsDefault.
    c.Next()
    return
}

This is defense-in-depth: it closes the theoretical path where a direct-to-backend caller sets crafted X-Forwarded-* headers alongside an API key token.


MINOR — OIDC authorization request only asks for openid scope

File: components/frontend/src/lib/oidc.ts, buildAuthorizationUrl

scope: "openid",

The code later depends on email and preferred_username claims. These are delivered by Keycloak because email and profile are in the client's defaultClientScopes, but the request doesn't ask for them explicitly. If the Keycloak admin ever changes the default scopes, the claims silently disappear and auth degrades in a hard-to-diagnose way. Should be:

scope: "openid email profile",

MINOR — getAccessToken() doesn't persist refreshed idToken

File: components/frontend/src/lib/session.ts

refreshOIDCTokens returns a new idToken (Keycloak always issues a new one on refresh). The refresh path saves accessToken, refreshToken, and expiresAt but drops idToken:

session.accessToken  = tokens.accessToken;
session.refreshToken = tokens.refreshToken;
// session.idToken not updated here
session.expiresAt    = tokens.expiresAt;

The old idToken from initial login continues to work as id_token_hint at logout (Keycloak accepts it), so this isn't broken today. But Keycloak rotates session keys on refresh, and some configurations invalidate old ID tokens. Add session.idToken = tokens.idToken to the refresh block for correctness.


MINOR — ambient-cli client lacks an audience mapper

File: components/manifests/overlays/hcmais/keycloak/keycloak-realm.json (and Kind equivalent)

The ambient-frontend client has a protocol mapper that adds ambient-frontend to the aud claim of access tokens. The ambient-cli client has no such mapper.

The backend validates JWT audience against SSO_AUDIENCE (which is ambient-frontend). A user who authenticates via the CLI client will receive a token without aud: ambient-frontend, and the backend will reject it with a validation error. If the CLI is intended to reach the backend in Phase 1 (or Phase 2), this needs an audience mapper on the CLI client. If the CLI is out of scope for now, add a comment in the realm JSON noting this.


MINOR — new URL(SSO_ISSUER_URL) at module init in Keycloak proxy

File: components/frontend/src/app/sso/[...path]/route.ts

const SSO_ORIGIN = process.env.SSO_ISSUER_URL
  ? new URL(process.env.SSO_ISSUER_URL).origin
  : null;

This executes at module load time. If SSO_ISSUER_URL is set to a malformed value (missing protocol, trailing space, etc.), new URL() throws a TypeError that propagates as an unhandled module-load error, crashing the Next.js worker. Wrap it:

function parseSSOOrigin(): string | null {
  try {
    return process.env.SSO_ISSUER_URL
      ? new URL(process.env.SSO_ISSUER_URL).origin
      : null;
  } catch {
    console.error('SSO_ISSUER_URL is not a valid URL:', process.env.SSO_ISSUER_URL);
    return null;
  }
}
const SSO_ORIGIN = parseSSOOrigin();

Nits

Keycloak proxy Location rewrite uses String.replace (first match only):

value.replace(SSO_ORIGIN, request.nextUrl.origin + "/sso")

String.replace with a string argument replaces only the first occurrence. Use replaceAll or a global regex. Unlikely to matter in practice (Location headers don't repeat the origin), but correct API is replaceAll.

hcmais realm client secret in keycloak-realm.json is CHANGE_ME:

If the realm JSON is imported into Keycloak as-is (which the --import-realm flag does on startup), the ambient-frontend Keycloak client secret will be set to CHANGE_ME, separate from the SSO_CLIENT_SECRET in the K8s secret. Both must be updated together. The kustomization now has a comment about this; consider adding a line specifically calling out the realm JSON client secret requirement: "secret": "<must match SSO_CLIENT_SECRET>".


Overall assessment

The core implementation is solid. The JWT validator, JWKS cache, K8s impersonation, BFF OIDC session model, and E2E bypass guards are all well-designed. The two major findings (groups scope gap and identity precedence inconsistency) should be addressed before this goes to production — they don't block the PoC deployment but will cause real RBAC failures when users with group-based access controls hit the system.

The five minor items are fixes, not blockers for the PoC. Requesting changes on the groups scope and identity precedence; everything else is advisory.

jsell-rh and others added 4 commits May 21, 2026 15:05
iron-session's cookie limit (4096 bytes) is exceeded when idToken is
included alongside accessToken and refreshToken — especially with
Keycloak identity brokering which produces large tokens.

Store idToken in a dedicated httpOnly oidc_id_token cookie. Logout
reads it from there for id_token_hint. Session type no longer includes
idToken.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redirect post-logout to /logged-out instead of app root (which
  triggers auto-redirect back to SSO, re-authenticating silently)
- Add /logged-out page excluded from SSO middleware
- Include ssoEnabled in /api/me response even when unauthenticated
  so the logout button always uses the correct path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s scope, proxy hardening

Align setIdentityFromClaims with buildImpersonatingClients claim order
(preferred_username > email > sub) so userID and K8s impersonation
identity are consistent. Guard X-Forwarded-* header fallthrough when
SSO is enabled. Add groups clientScope and audience mapper to both
Kind and hcmais Keycloak realms. Request explicit OIDC scopes, persist
refreshed idToken, and harden the Keycloak proxy init.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants